Investigating Stroke-Level Information for Learning Chinese Word Embeddings
Abstract
We propose a novel method for learning Chinese word embeddings. Unlike previous approaches, we investigate the effectiveness of Chinese stroke-level information when learning these embeddings. Empirically, our model consistently outperforms several state-of-the-art methods, including skipgram, cbow, GloVe and CWE, on the standard word similarity and word analogy tasks.
Similar resources
Improve Chinese Word Embeddings by Exploiting Internal Structure
Recently, researchers have demonstrated that both a Chinese word and its component characters provide rich semantic information when learning Chinese word embeddings. However, they ignored the semantic similarity across component characters in a word. In this paper, we learn the semantic contribution of characters to a word by exploiting the similarity between a word and its component characters ...
Radical-Based Hierarchical Embeddings for Chinese Sentiment Analysis at Sentence Level
Text representation in Chinese sentiment analysis usually works at the word or character level. In this paper, we prove that radical-level processing could greatly improve sentiment classification performance. In particular, we propose two types of Chinese radical-based hierarchical embeddings. The embeddings incorporate not only semantics at radical and character level, but also sentiment inf...
Learning Sense-specific Word Embeddings By Exploiting Bilingual Resources
Recent work has shown success in learning word embeddings with neural network language models (NNLM). However, the majority of previous NNLMs represent each word with a single embedding, which fails to capture polysemy. In this paper, we address this problem by representing words with multiple and sense-specific embeddings, which are learned from bilingual parallel data. We evaluate our embeddi...
Word and Document Embeddings based on Neural Network Approaches
Data representation is a fundamental task in machine learning. The representation of data affects the performance of the whole machine learning system. Historically, data representation has been done through feature engineering, with researchers aiming to design better features for specific tasks. Recently, the rapid development of deep learning and representation learning has brought new inspi...
Multi-Granularity Chinese Word Embedding
This paper considers the problem of learning Chinese word embeddings. In contrast to English, a Chinese word is usually composed of characters, and most of the characters themselves can be further divided into components such as radicals. While characters and radicals contain rich information and are capable of indicating semantic meanings of words, they have not been fully exploited by existin...